Identifying the potential of Near Data Computing for Apache Spark

نویسندگان

  • Ahsan Javed Awan
  • Mats Brorsson
  • Vladimir Vlassov
  • Eduard Ayguadé
چکیده

While cluster computing frameworks are continuously evolving to provide real-time data analysis capabilities, Apache Spark has managed to be at the forefront of big data analytics for being a unified framework for both, batch and stream data processing. There is also a renewed interest is Near Data Computing (NDC) due to technological advancement in the last decade. However, it is not known if NDC architectures can improve the performance of big data processing frameworks such as Apache Spark. In this position paper, we hypothesize in favour of NDC architecture comprising programmable logic based hybrid 2D integrated processingin-memory and in-storage processing for Apache Spark, by extensive profiling of Apache Spark based workloads on Ivy Bridge Server.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Behavior-Based Online Anomaly Detection for a Nationwide Short Message Service

As fraudsters understand the time window and act fast, real-time fraud management systems becomes necessary in Telecommunication Industry. In this work, by analyzing traces collected from a nationwide cellular network over a period of a month, an online behavior-based anomaly detection system is provided. Over time, users' interactions with the network provides a vast amount of usage data. Thes...

متن کامل

A Reference Architecture and Road map for Enabling E- commerce on Apache Spark

Apache Spark is an execution engine that besides working as an isolated distributed, in-memory computing engine also offers close integration with Hadoop’s distributed file system (HDFS). Apache Spark's underlying appeal is in providing a unified framework to create sophisticated applications involving workloads. It unifies multiple workloads, handles unstructured data very well and has easy-to...

متن کامل

On the usability of Hadoop MapReduce, Apache Spark & Apache flink for data science

Distributed data processing platforms for cloud computing are important tools for large-scale data analytics. Apache Hadoop MapReduce has become the de facto standard in this space, though its programming interface is relatively low-level, requiring many implementation steps even for simple analysis tasks. This has led to the development of advanced dataflow oriented platforms, most prominently...

متن کامل

Novel Apache Spark based Algorithm to Solve Dirichlet Problem for Poisson Equation in 3D Computational Domain

Corresponding Author: Shomanov Aday Department of Computer Science, al-Farabi Kazakh National University, Almaty, Kazakhstan Email: [email protected] Abstract: Parallel computations are essential tool in solving large-scale computationally demanding problems. Due to large diversity and heterogeneity of the currently available parallel processing techniques and paradigms it is usually diff...

متن کامل

Dynamic Multi-Objective Optimization with jMetal and Spark: A Case Study

Technologies for Big Data and Data Science are receiving increasing research interest nowadays. This paper introduces the prototyping architecture of a tool aimed to solve Big Data Optimization problems. Our tool combines the jMetal framework for multi-objective optimization with Apache Spark, a technology that is gaining momentum. In particular, we make use of the streaming facilities of Spark...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:
  • CoRR

دوره abs/1707.09323  شماره 

صفحات  -

تاریخ انتشار 2017